ITEP-32416 Add FP16 inference with feature flag #233
Conversation
Looks great, minor comments
Pull Request Overview
This PR adds a feature flag to toggle FP16 inference and updates model creation, selection logic, and tests to support an FP16-first pipeline with FP32 fallback.
- Introduces FEATURE_FLAG_FP16_INFERENCE and uses it in prepare_train and ModelRepo to choose precision order.
- Renames mo_fp32_with_xai → mo_with_xai across code, fixtures, and tests.
- Updates ModelRepo.get_latest_model_for_inference* to fetch all matching precisions and implement FP16/FP32 fallback (sketched below).
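The FP16-first fallback described in the last bullet might look roughly like the following. This is a minimal sketch with simplified document fields and a hypothetical function name; the real logic lives in ModelRepo.get_latest_model_for_inference*:

```python
from enum import Enum


class ModelPrecision(Enum):
    FP16 = "FP16"
    FP32 = "FP32"


def pick_latest_model_for_inference(docs: list[dict], fp16_enabled: bool) -> dict | None:
    """Return the newest document in the preferred precision, else fall back.

    `docs` is assumed to be sorted newest-first by the aggregation pipeline.
    """
    preferred = ModelPrecision.FP16 if fp16_enabled else ModelPrecision.FP32
    for precision in (preferred, ModelPrecision.FP32):
        for doc in docs:
            if doc.get("precision") == precision.name:
                return doc
    return None


# Example: with the flag on, the FP16 doc wins; with it off, FP32 is chosen.
docs = [{"_id": 2, "precision": "FP16"}, {"_id": 1, "precision": "FP32"}]
assert pick_latest_model_for_inference(docs, fp16_enabled=True)["_id"] == 2
assert pick_latest_model_for_inference(docs, fp16_enabled=False)["_id"] == 1
```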
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py | Use feature flag to set FP16 or FP32 for XAI model |
| interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py | Swap mo_fp32_with_xai references to mo_with_xai |
| interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py | Add FEATURE_FLAG_FP16_INFERENCE enum entry |
| interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py | Rename mo_fp32_with_xai → mo_with_xai in IDs/parse |
| interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py | Update inference query to include both precisions and implement fallback logic |
| Tests and fixtures (multiple files) | Rename fields/tests for mo_with_xai and cover both flag states |
Comments suppressed due to low confidence (2)
interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:450
- Aggregation pipeline lacks a $sort stage to ensure the latest model is returned first; this can lead to selecting an older model ID when multiple precisions exist. Add sorting by version or _id before projecting.
matched_docs = list(self.aggregate_read(aggr_pipeline))
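A fix along the suggested lines would add a $sort stage ahead of the projection. The surrounding stages below are placeholders, not the repository's actual pipeline:

```python
aggr_pipeline = [
    {"$match": {"precision": {"$in": ["FP16", "FP32"]}}},
    # Sort newest-first so the first matched document is the latest model;
    # without this, MongoDB returns documents in unspecified order.
    {"$sort": {"version": -1, "_id": -1}},
    {"$project": {"_id": 1, "precision": 1, "version": 1}},
]
```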
interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py:431
- [nitpick] Update this docstring to reflect the new FP16-first behavior when the feature flag is enabled, e.g. mention M6_FP16 as expected under fp16-enabled.
The latest model for inference is M4 (the first one generated after the base model).
…hen fetching models + add unit test for fallback model
Pull Request Overview
The PR adds a new feature flag (FEATURE_FLAG_FP16_INFERENCE) to support FP16 inference and updates model selection logic, renaming the optimized model field from mo_fp32_with_xai to mo_with_xai throughout the code and tests. Key changes include:
- Adding FP16 feature flag support in feature flag services and enum.
- Updating model creation and selection logic to conditionally use FP16 based on the flag.
- Refactoring tests and fixture data to reflect the renamed model field and validate FP16 and FP32 scenarios (a test sketch follows this list).
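Covering both flag states typically takes a parameterized test along these lines. This is a sketch: choose_precision and the environment-variable flag are stand-ins, not the repository's actual helpers or feature flag provider:

```python
import os

import pytest


def choose_precision() -> str:
    # Hypothetical stand-in for the flag-driven selection in train_helpers;
    # an environment variable plays the role of the feature flag provider.
    enabled = os.environ.get("FEATURE_FLAG_FP16_INFERENCE", "false") == "true"
    return "FP16" if enabled else "FP32"


@pytest.mark.parametrize("flag_value, expected", [("true", "FP16"), ("false", "FP32")])
def test_choose_precision(monkeypatch, flag_value, expected):
    monkeypatch.setenv("FEATURE_FLAG_FP16_INFERENCE", flag_value)
    assert choose_precision() == expected
```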
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| web_ui/src/core/feature-flags/services/feature-flag-service.interface.ts | Added FEATURE_FLAG_FP16_INFERENCE flag to the development features |
| interactive_ai/workflows/geti_domain/train/tests/unit/workflows/test_train_workflow.py | Updated reference to mo_with_xai in tests after renaming |
| interactive_ai/workflows/geti_domain/train/tests/unit/tasks/prepare_and_train/test_train_helpers.py | Added parameterized tests for feature flag-driven precision selection |
| interactive_ai/workflows/geti_domain/train/tests/unit/tasks/evaluate_and_infer/test_evaluate_and_infer.py | Updated model field references in evaluation/inference tests |
| interactive_ai/workflows/geti_domain/train/tests/fixtures/train_workflow_data.py | Updated fixture to use the new model field name mo_with_xai |
| interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py | Modified model builder creation to choose FP16 or FP32 based on the feature flag |
| interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py | Updated inference tasks to reference the new model field naming |
| interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py | Refactored TrainOutputModelIds and TrainOutputModels to use mo_with_xai |
| interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py | Added FEATURE_FLAG_FP16_INFERENCE to the enumerated flags |
| interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py | Revised tests for the model repository to validate FP16/FP32 selection logic |
| interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py | Updated repository queries and aggregation pipelines to prioritize FP16 via feature flag |
Comments suppressed due to low confidence (1)
interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:398
- [nitpick] Consider verifying that sorting by _id ascending reliably reflects the creation order when selecting the earliest model for inference; if _id does not guarantee chronological order, you might sort using a dedicated timestamp field for better clarity.
"precision": {"$in": [ModelPrecision.FP16.name, ModelPrecision.FP32.name]},
📝 Description
This PR introduces FEATURE_FLAG_FP16_INFERENCE to control which model precision is used for inference operations. The change allows for more efficient resource utilization while maintaining backward compatibility.

Changes:
- Added FEATURE_FLAG_FP16_INFERENCE to control model precision selection

Details
When enabled, the system will:
- export the optimized XAI model (mo_with_xai) in FP16 rather than FP32
- prefer the latest FP16 model when selecting a model for inference
- fall back to the latest FP32 model when no FP16 model is available
This change optimizes resource utilization by deploying more compact FP16 models that consume less memory and storage while maintaining inference performance.
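As a rough sizing illustration (not a measurement from this repository): FP16 stores each weight in 2 bytes versus 4 bytes for FP32, so a 30M-parameter model needs about 60 MB of weight storage instead of roughly 120 MB.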
JIRA: ITEP-32416, ITEP-66504, ITEP-66505
✨ Type of Change
Select the type of change your PR introduces:
🧪 Testing Scenarios
Describe how the changes were tested and how reviewers can test them too:
✅ Checklist
Before submitting the PR, ensure the following: